17 research outputs found

    TrIMS: Transparent and Isolated Model Sharing for Low Latency Deep Learning Inference in Function as a Service Environments

    Full text link
    Deep neural networks (DNNs) have become core computation components within low-latency Function as a Service (FaaS) prediction pipelines, including image recognition, object detection, natural language processing, speech synthesis, and personalized recommendation pipelines. Cloud computing, as the de facto backbone of modern computing infrastructure for both enterprise and consumer applications, has to be able to handle user-defined pipelines of diverse DNN inference workloads while maintaining isolation and latency guarantees and minimizing resource waste. The current solution for guaranteeing isolation within FaaS is suboptimal -- suffering from "cold start" latency. A major cause of this inefficiency is the need to move large amounts of model data within and across servers. We propose TrIMS as a novel solution to address these issues. Our proposed solution consists of a persistent model store across the GPU, CPU, local storage, and cloud storage hierarchy; an efficient resource management layer that provides isolation; and a succinct set of application APIs and container technologies for easy and transparent integration with FaaS, Deep Learning (DL) frameworks, and user code. We demonstrate our solution by interfacing TrIMS with the Apache MXNet framework, achieving up to 24x speedup in latency for image classification models and up to 210x speedup for large models, along with up to 8x improvement in system throughput. Comment: In Proceedings CLOUD 201
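    The core idea -- keeping already-loaded models resident and sharing them across function invocations instead of re-reading them from storage -- can be sketched in a few lines of Python. This is an illustrative stand-in, not TrIMS's actual API: the real system shares models across processes and the GPU/CPU/storage hierarchy via a resource manager, while the `ModelStore` below only caches within one process.

```python
# Minimal sketch of the model-sharing idea behind TrIMS (hypothetical names;
# the real system provides cross-process isolation and a full storage hierarchy).
import threading
import time

class ModelStore:
    """Keeps loaded models resident so repeated invocations skip the
    'cold start' cost of re-reading weights from storage."""

    def __init__(self, loader):
        self._loader = loader          # function: model_name -> model object
        self._cache = {}               # model_name -> loaded model
        self._lock = threading.Lock()  # serialize concurrent handlers

    def get(self, model_name):
        with self._lock:
            if model_name not in self._cache:
                # Slow path: first invocation pays the full load cost.
                self._cache[model_name] = self._loader(model_name)
            # Fast path: model is already resident.
            return self._cache[model_name]

def slow_loader(name):
    time.sleep(0.5)  # stand-in for reading large weights from disk/cloud
    return {"name": name, "weights": "..."}

store = ModelStore(slow_loader)
t0 = time.time(); store.get("resnet50"); cold = time.time() - t0
t0 = time.time(); store.get("resnet50"); warm = time.time() - t0
print(f"cold start: {cold:.3f}s, warm start: {warm:.6f}s")
```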

    Cardiac Surgery in Jehovah's Witnesses

    Full text link
    Cardiac surgery in Jehovah's Witnesses poses a medical and ethical challenge due to their refusal of blood transfusions. The present study examined 42 Jehovah's Witness patients who underwent cardiac surgery with cardiopulmonary bypass, with regard to complication rates, mortality, and alternative methods for raising the hemoglobin concentration. Erythropoietin and iron preparations were administered perioperatively. The hemoglobin course was characterized by pronounced hemodilution at the end of cardiopulmonary bypass and a significant rise on the first postoperative day. A threshold of 7.5 g/dl proved to be the decisive marker. Length of ICU and hospital stay, the occurrence of transient postoperative delirium, and respiration correlated significantly with the hemoglobin value. Legal aspects and conflicts of medical treatment are clearly regulated by case law.

    Accelerating Reduction and Scan Using Tensor Core Units

    Full text link
    Driven by deep learning, there has been a surge of specialized processors for matrix multiplication, referred to as Tensor Core Units (TCUs). These TCUs are capable of performing matrix multiplications on small matrices (usually 4x4 or 16x16) to accelerate the convolutional and recurrent neural networks in deep learning workloads. In this paper we leverage NVIDIA's TCUs to express both reduction and scan with matrix multiplication and show the benefits -- in terms of program simplicity, efficiency, and performance. Our algorithm exercises the NVIDIA TCUs, which would otherwise be idle, achieves 89%-98% of peak memory copy bandwidth, and is orders of magnitude faster (up to 100x for reduction and 3x for scan) than state-of-the-art methods for small segment sizes -- common in machine learning and scientific applications. Our algorithm achieves this while decreasing power consumption by up to 22% for reduction and 16% for scan. Comment: In Proceedings of the ACM International Conference on Supercomputing (ICS '19)
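    The underlying trick -- recasting reduction and scan as small matrix multiplications -- can be illustrated with a NumPy sketch. The shapes and variable names below are illustrative only; the paper's implementation issues these as 16x16 WMMA operations on the Tensor Cores rather than NumPy calls.

```python
# Reduction and scan expressed purely as matrix multiplication
# (NumPy stand-in for 16x16 Tensor Core tiles).
import numpy as np

N = 16
x = np.arange(1.0, N * N + 1)      # one 256-element segment

# Reduction: lay the segment out as a 16x16 tile A, then
# sum(x) = ones_row @ A @ ones_col -- two small matmuls, no explicit loop.
A = x.reshape(N, N)
ones_row = np.ones((1, N))
ones_col = np.ones((N, 1))
total = (ones_row @ A @ ones_col)[0, 0]
assert total == x.sum()

# Scan (inclusive prefix sum) of one 16-element vector: multiplying by an
# upper-triangular ones matrix U gives (v @ U)[j] = sum(v[:j+1]).
U = np.triu(np.ones((N, N)))
v = x[:N]
inclusive_scan = v @ U
assert np.allclose(inclusive_scan, np.cumsum(v))
```

    Full segments are processed tile by tile in the same way, with per-tile partial results combined by a second round of matmuls, which is what keeps the whole computation on the otherwise-idle TCUs.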

    Compiling high-level scripting languages to performant code

    No full text
    The popularity of data- and scientific-oriented applications, the abundance of on-demand compute resources, and the scarcity of domain-expert programmers have given rise to high-level scripting languages. These high-level scripting languages offer a fast way to translate ideas into code, but tend to incur a heavy performance overhead. To alleviate the performance penalty, each implementation of these languages often offers a compilation path to a subset of the language. In this thesis, we present the design and implementation of the Wolfram Language compiler, the production compiler for the Wolfram Language. We show how popular language features and runtime behavior, expected by Wolfram Language developers, are efficiently implemented within the compiler. We then show how the compiler provides a frictionless path to migrate programs from the interpreter to the compiler. We evaluate the compiler and show that the compiled code matches the performance of highly tuned hand-written C code. Unlike existing techniques that compile a subset of the language, the compiler supports the entirety of the Wolfram Language. We show why the compiler is a new model of development for programmers and showcase some applications of the compiler. The compiler has been released as a prominent feature of the Wolfram Engine, is readily available to developers, and is used by internal and external users to drive Wolfram Language features and implementations.